Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | de | 26 | coma |
2 | la | 27 | o |
3 | e | 28 | pas |
4 | lo | 29 | foguèt |
5 | a | 30 | sus |
6 | en | 31 | En |
7 | per | 32 | dei |
8 | que | 33 | d'un |
9 | un | 34 | tanben |
10 | es | 35 | entre |
11 | del | 36 | sa |
12 | una | 37 | fòrça |
13 | las | 38 | d'una |
14 | son | 39 | èra |
15 | dins | 40 | Los |
16 | se | 41 | Es |
17 | los | 42 | nom |
18 | Lo | 43 | au |
19 | La | 44 | cap |
20 | mai | 45 | s |
21 | al | 46 | sègle |
22 | dels | 47 | A |
23 | amb | 48 | Pasmens |
24 | lei | 49 | i |
25 | dau | 50 | leis |
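The table shows the 50 most frequent words of the corpus. At these ranks the entries are usually stopwords, that is, function words such as articles, prepositions, and conjunctions.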
Language: Afrikaans
This list is a good candidate for a first stopword list for the language.
Usually a small, balanced corpus is enough to obtain a good list of high-frequency words. If the small corpus is dominated by one very prominent topic, however, words from that topic will show up even among the top-ranked words.
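As a rough illustration of how such a list can be derived outside the database, the following minimal sketch counts token frequencies and returns the top-ranked entries. The function name `top_words` and the toy token list are assumptions made for this example and are not part of the corpus tooling. Tokens are compared case-sensitively, which matches the table above, where "lo" and "Lo" are ranked separately.

```python
from collections import Counter


def top_words(tokens, n=50):
    """Return the n most frequent tokens as (rank, word, count) tuples.

    Hypothetical helper for illustration; not part of the corpus tooling.
    """
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(ranked[:n], start=1)
    ]


# Toy, made-up token list (an assumption, not real corpus data).
sample = "lo gat e lo can e la gata".split()
for rank, word, count in top_words(sample, n=3):
    print(rank, word, count)
```

Running the same function on a topic-heavy sample is a quick way to check whether topic words leak into the top ranks before the list is adopted as a stopword list.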
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
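The query above can be tried on a small in-memory database. The sketch below assumes a `words` table with just the two columns the query uses (`w_id`, `word`), and it assumes, as the `w_id-100` offset suggests, that ids 1 to 100 are reserved for non-word entries and that words are stored in frequency order from id 101 onward. The real schema may contain further columns; the placeholder rows are invented for the example.

```python
import sqlite3

# In-memory database with a minimal, assumed schema: only the two
# columns referenced by the query (w_id, word).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# Assumption: ids 1..100 are reserved (placeholder rows, invented here).
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [(i, f"<reserved-{i}>") for i in range(1, 101)],
)

# The first five words from the table above, stored in frequency order
# starting at id 101.
top = ["de", "la", "e", "lo", "a"]
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [(100 + rank, word) for rank, word in enumerate(top, start=1)],
)

# The query from the text, unchanged.
rows = conn.execute(
    "select w_id-100 as rank_in_wordlist, word "
    "from words where w_id>100 order by w_id limit 50;"
).fetchall()

for rank, word in rows:
    print(rank, word)
```

Under these assumptions, subtracting 100 from `w_id` yields the rank directly, so no separate frequency sort is needed at query time.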
3.4 Sample words for different frequency ranges